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Abstract.  In  this  paper  we  discuss  the  design  and  implementation  of  a  Newton-Krylov-Schwarz  solver  for  the 
implicit  temporal  integration  on  an  unstructured  three-dimensional  spatial  mesh  of  Richards’  equation  for  groundwater 
flow  in  unsaturated  porous  media.  We  use  aggregation  techniques  from  the  algebraic  multigrid  literature  to  construct 
a  coarse  mesh  for  two-level  Schwarz  methods.  Our  coarse  mesh  differs  from  other  constructions  in  that  no  coarse 
mesh  geometry  need  be  created  and  we  do  not  need  geometric  information  about  the  subdomains.  We  report  on  a 
computational  example  to  illustrate  the  performance  of  the  preconditioner. 
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1.  Introduction.  In  this  paper  we  discuss  the  design  and  implementation  of  a  Newton-Krylov- 
Schwarz  solver  for  the  implicit  temporal  integration  on  an  unstructured  three-dimensional  spatial 
mesh  of  Richards’  equation  for  groundwater  flow  in  unsaturated  porous  media.  We  use  aggregation 
techniques  from  the  algebraic  multigrid  literature  [33]  to  construct  a  coarse  mesh  for  two-level 
Schwarz  methods.  Our  coarse  mesh  differs  from  other  constructions,  for  example  that  in  [9],  in 
that  no  coarse  mesh  geometry  need  be  created,  and  we  do  not  need  geometric  information  about 
the  subdomains. 

The  mixed  form  of  Richards’  Equation  is  [8] 

(1.1)  SsS[r)lJf  +  =  V  •  [Kskr  (iP)  V  ('(/’  +  z)]  +  W 

where  %i)  is  pressure  head;  Ss  is  the  specific  storage,  which  accounts  for  water  compressibility  and 
aquifer  elasticity;  S  ( A )  is  the  water  saturation  or  volumetric  fraction  of  pore  space  occupied  by 
water;  r)  is  the  porosity  or  volumetric  void  fraction;  Ks  is  the  water- saturated  hydraulic  conductiv¬ 
ity;  Ay  is  the  relative  permeability  of  the  media;  and  W  is  a  source/sink  term.  In  this  formulation, 
the  z  axis  is  the  vertical  direction  oriented  positively  upward.  Both  S  and  Ay  are  functions  of  v\ 
Ks  and  Ss  are  provided  as  data. 

The  constitutive  mode  for  saturation  is 

(1.2)  s  =  Sr  +  T7~~(  I  TTTHrmi  ^  <  0 

[1  +  (am)  ] 
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where  Sr  is  the  residual  saturation,  a  >  0  and  n  >  1  are  parameters  specific  to  the  fluid  and 
soil  [25],  and  m  =  1  —  1/n  [32].  The  relative  permeability  function  is  [25] 


(1.3) 


Ay  = 


[l  -  (« 


71  —  1 


[i  +  H^ir 


[i 


[a 


nim/2 


ip  <  0. 


Both  S'  and  Ay  are  not  differentiable  at  ip  =  0  when  1  <  n  <  2.  We  address  this  nonsmoothness 
and  the  high  cost  of  evaluation  of  the  nonlinearities  in  (1.2)  and  (1.3)  by  approximating  S  and  A:,, 
with  a  spline  [23, 29].  We  used  a  piecewise  linear  spline  in  the  computations  reported  here. 

For  the  sand,  silt,  and  clay  soils  used  in  the  computations  in  §  3,  we  use  parameters  from  the 
literature  [7, 21].  These  parameters  are  given  in  Table  1.1  and  the  saturation  and  relative  perme¬ 
ability  functions  are  plotted  in  Figure  1.1.  Note  the  strong  dependence  of  relative  permeability  on 
the  pressure  head  in  the  Figure  1.1. 


Table  1 . 1 

Typical  parameters  for  soil  textural  groups 


Soil 

Sr 

a 

n 

Textural  Group 

(-) 

(1/ft) 

(-) 

Sand 

0.105 

4.420 

2.68 

Silt 

0.074 

0.487 

1.37 

Clay 

0.179 

0.244 

1.09 

Fig.  1.1.  The  ip-S  (left)  and  ip-kr  ( right)  models  used  in  the  column  drainage  test. 


Implicit  temporal  integration  in  three  space  dimensions  leads  to  large  nonlinear  equations  at 
each  time  step.  Newton-Krylov-Schwarz  methods  solve  nonlinear  equations  by  using  Newton’s 
method  with  a  Schwarz  domain  decomposition  preconditioned  Krylov  method  to  approximate  the 
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Newton  step.  These  methods  have  been  used  in  computational  fluid  dynamics  for  some  time  [5, 6, 
20].  This  paper  is  one  of  the  first  to  apply  these  methods  to  Richards’  equation.  Multigrid  methods 
have  also  been  applied  to  Richards’  equation,  [10,28,34,35]. 

In  §  2  we  describe  the  nonlinear  and  linear  solver  issues  raised  by  Richards’  equation,  intro¬ 
duce  some  notation  from  [27]  for  domain  decomposition,  and  use  that  notation  to  describe  our 
approach  to  Schwarz  preconditioning. 

In  §  3  we  report  numerical  results  for  the  solution  of  a  finite  element  discretization  of  Richards’ 
equation  in  three  space  dimensions.  Our  numerical  experiments,  done  on  an  IBM  SP  at  the  North 
Carolina  Supercomputing  Center  (NCSC),  show  that  the  two-level  preconditioners  perform  well 
and  have  very  good  scalability. 

The  work  described  in  this  paper  is  part  of  the  development  of  the  Adaptive  Hydrology  model, 
or  ADH.  The  ADH  model  is  a  production  code  being  developed  at  WES  and  is  designed  for  the 
analysis  of  basin  scale  hydrologic  phenomena.  The  focus  of  this  paper  is  the  preconditioners  and 
we  use  only  fixed  (z.  e.  time  independent)  unstructured  meshes.  We  have  also  applied  these  ideas 
to  surface  water  flow  [17,18].  We  will  describe  adaptive  spatial  and  temporal  refinement  and  more 
complex  simulations  in  future  work. 

2.  Newton-Krylov-Schwarz  methods.  The  weak  formulation  of  Richards’  equation  leads  to 
finite  element  discretizations  that  are  implicit  in  time.  An  elliptic  partial  differential  equation  must 
be  solved  at  each  time  step.  After  discretization  one  obtains  a  large  system  of  nonlinear  equations. 
In  this  section  we  begin  with  a  description  of  inexact  Newton  methods  [11]  in  general  terms  and 
then  show  how  Newton-Krylov-Schwarz  methods  fit  in  that  framework. 

2.1.  Nonlinear  Solvers.  We  express  nonlinear  equations  in  the  general  form: 

(2.1)  F(i)  =  0. 

We  will  assume  that  a  solution  £*  exists  and  that  a  good  approximation  to  £*  is  available.  This 
latter  assumption  is  appropriate  in  the  context  of  implicit  temporal  integration,  where  £*  is  the 
solution  at  the  new  time  step  and  the  initial  approximation  (the  predictor)  is  an  interpolation  of 
solutions  at  previous  times.  We  let  F'(£)  denote  the  Jacobian  of  F  at  a  vector  £.  We  assume  that 
F'(£)  is  Lipshitz  continuous  in  £  and  F' ( £* )  is  nonsingular.  These  are  standard  assumptions  in  non¬ 
linear  equations  [19]  and  are  valid  for  the  differential  equations  under  consideration  in  this  paper. 
The  smoothness  assumptions  can  be  violated  by  the  nonlinearities  for  Richards’  equation  in  some 
cases  [24],  but  this  nonsmoothness  can  be  managed  by  approximation  with  a  well-chosen  spline. 
Even  the  nonsmooth  effects  of  a  piecewise  linear  spline,  which  we  use  in  our  implementation,  are 
benign. 

We  will  describe  Newton’s  method  for  (2.1)  in  the  standard  way  [12, 19]  in  terms  of  the  tran¬ 
sition  from  a  current  approximation  £c  to  the  solution  to  a  new  approximation  £+  and  describe 
the  convergence  in  terms  of  the  relative  sizes  of  the  new  error  e+  =  £+  —  and  the  current  error 
ec  =  £c  —  £*.  The  Newton  iteration  is 

(2-2)  Z+  =  tc-F'(Zcr1F(Zc)- 

In  (2.2)  F'(£c)  is  the  Jacobian  matrix  at  the  current  iteration.  Given  sufficiently  good  data  and 
sufficiently  smooth  nonlinearities  the  Newton  iteration  will  converge  quadratically,  i.  e. 

IMI=0(||ec||2), 
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where  e  =  £  —  £*.  In  temporal  integration  applications  the  nonlinear  iteration  is  terminated  when 

(2-3)  11^(011  <Ta  +  T,.||F(£0)||, 

where  r0  and  rr  are  absolute  and  relative  error  tolerances. 

One  does  not  compute  the  Newton  step  s  =  — -F'(£C)_1.F(£C)  by  using  the  inverse  of  the 
Jacobian  but  rather  by  solving  (perhaps  approximately)  the  linear  equation 

(2.4)  F'{Zc)s  =  -F(Q. 

For  problems  in  three  space  dimensions  the  use  of  a  direct  method  for  solving  (2.4)  is  im¬ 
practical  for  reasons  of  both  storage  and  computational  cost.  One  must  solve  (2.4)  by  an  iterative 
method.  It  is  common  [11,19, 26]  to  terminate  that  linear  iteration  when  the  relative  linear  residual 
is  small,  i.  e.  when 

(2-5)  ||i^c)s  +  F(£c)||  <77r||F(£c)|| 

for  some  small  r]r.  (2.5)  is  called  the  inexact  Newton  condition.  The  parameter  r)r  is  called  the 
forcing  term.  In  applications  to  temporal  integration,  one  can  use  absolute  residuals  and  gain  some 
efficiency  [3].  We  can  express  convergence  results  for  both  approaches  in  terms  of  the  termination 
condition 

(2-6)  \\F'^c)s  +  F^c)\\<Va  +  Vr\\F(Q\\. 

If  the  initial  iterate  is  near  the  solution,  the  nonlinearity  is  sufficiently  smooth,  and  (2.6)  holds 
for  some  7]a  and  0  <  rjr  <  1  then  the  error  in  £+  satisfies,  [3,11, 19], 

(2.7)  1 1  e+  1 1  =  O  ( 1 1  e<s'f2  +  Vr  1 1  dej  |  +  Va )  - 

Clearly,  if  £c  is  near  £*  and  t)r  is  sufficiently  small  the  iteration  will  converge  rapidly.  However, 
solving  the  equation  for  the  Newton  step,  (2.4),  to  very  high  accuracy  may  be  wasteful,  particularly 
if  the  initial  iterate  is  far  from  the  solution  [13, 19]. 

2.2.  The  Linear  Iteration  and  Preconditioning.  If  the  linear  equation  (2.4)  for  the  Newton 
step  is  solved  by  an  iterative  method,  the  overall  iteration  is  called  a  Newton-Iterative  method. 
The  iteration  (2.2)  is  called  the  nonlinear  iteration  or  the  outer  iteration ,  and  the  iteration  to  solve 
the  linear  equation  (2.4)  is  called  the  linear  iteration  or  the  inner  iteration.  If  a  Krylov  method  is 
used  as  the  linear  solver  the  combination  was  called  a  Newton-Krylov-Schwarz  method  in  [5]  and 
that  term  is  now  common.  In  this  work  we  use  the  Krylov  method  BiCGSTAB  [15, 19, 30]  as  the 
linear  solver  with  an  additive  Schwarz  preconditioner.  Other  low-storage  transpose-free  Krylov 
methods  for  nonsymmetric  methods  [14, 16]  were  tested  in  the  early  stages  of  this  project  along 
with  BiCGSTAB(2),  using  the  implementation  from  [31]. 

Preconditioning  (from  the  right  in  our  case),  which  is  critical  to  good  performance,  replaces 
(2.4)  with  the  equivalent  system 

(2.8)  F'(QMs  =  /'(<() 

and  then  sets  s  =  Ms.  The  preconditioning  operator  M  is  constructed  so  that  (2.8)  is  easier  to 
solve  than  (2.4). 
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2.3.  Schwarz  Preconditioners.  The  discretized  elliptic  problems  that  must  be  solved  at  each 
time  step  have  the  general  form 

(2.9)  TZ(Uh.  v )  =  0,  for  all  v  £  Vh 

where  7Z  is  the  weak  form  of  the  nonlinear  equation,  Vh  is  the  space  of  test  functions  and  If  £  Vh. 
TZ  is  linear  in  the  second  argument  and  nonlinear  in  the  first.  Vh  is  the  set  of  real-valued  piecewise 
linear  functions  on  the  unstructured  spatial  mesh. 

We  begin  by  splitting  the  original  physical  domain  O  into  subdomains  i  1,  and  restricting  the  ac¬ 
tion  of  the  differential  operator  to  the  subdomains.  The  division  leads  to  the  creation  of  new  bound¬ 
ary  pieces,  called  artificial  boundaries,  for  the  subdomains.  Figure  1.1  depicts  a  one-dimensional 
domain  fi  that  is  split  into  two  subdomains,  Q|  and  Q2.  The  artificial  boundaries  created  by  the 
split  are  Ti  and  T2,  with  Ti  part  of  the  boundary  for  D|  and  T2  part  of  the  boundary  for  Q2. 


Fig.  2.1.  Subdomain  Splits  in  One-Dimension  for  Two  Subdomains 
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When  creating  the  subdomains  from  the  original  domain,  one  can  require  that  the  subdomains 
share  an  artificial  boundary  or  one  can  allow  the  subdomains  to  overlap,  as  depicted  in  Figure  2.1. 
Subdomains  overlap  if  part  or  all  of  the  artificial  boundary  for  one  subdomain  lies  in  the  interior 
of  an  adjacent  subdomain.  In  Figure  2.1,  the  artificial  boundary  Fi  lies  in  the  interior  of  the 
subdomain  fl2,  and  similarly  F2  lies  in  fii.  In  ADH,  the  domains  overlap  with  an  overlap  width  of 
h,  the  width  of  a  single  element.  Increased  overlap  is  difficult  to  achieve  when  using  unstructured 
meshes  and  we  have  not  attempted  to  incorporate  it.  The  choice  of  overlapping  subdomains  led  to 
the  use  of  Schwarz-type  preconditioners.  General  discussions  of  Schwarz  preconditioners  may  be 
found  in  [27]. 

The  closure  of  the  differential  operator  restricted  to  the  subdomains  requires  that  boundary 
conditions  be  placed  on  the  artificial  boundaries,  as  these  boundaries  were  not  part  of  the  original 
problem  formulation.  Schwarz  methods  use  zero  Dirichlet  boundary  conditions  on  the  artificial 
boundaries  [27]  because  the  algorithms  incorporate  error  corrections  on  the  artificial  boundaries 
during  the  subdomain  solves. 

The  one-level  additive  Schwarz  preconditioner  is  a  block  Jacobi  preconditioner.  Let  A  be 
the  discretization  of  the  differential  operator,  and  assume  that  the  number  of  nodes  of  the  discrete 
problem  included  in  Q,  is  n, .  Define  the  matrix  R,  to  be  the  (discrete)  restriction  operator  for 
subdomain  i,  i.e.,  R,,  =  [  0  I  0  ]  where  /  is  of  size  n,  x  n,.  If 

Bi  =  Rj  (. RtARjylRt 
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then  the  one-level  additive  Schwarz  preconditioner  is 
(2.10)  M  =  £  A 

i—  1 

where  p  is  the  number  of  subdomains. 

This  preconditioner  is  implemented  readily  on  a  multiprocessor  computer  by  assigning  one  or 
more  subdomains  to  a  processor.  Because  the  overlap  between  subdomains  is  minimal  and  and  zero 
Dirichlet  conditions  are  imposed  on  the  artificial  boundaries,  there  is  no  need  for  communication 
after  the  application  of  the  one-level  additive  Schwarz  preconditioner. 

We  found  that  the  performance  of  the  one-level  method  is  inferior  to  two-level  methods,  which 
we  describe  now.  A  two-level  method  requires  a  coarse  mesh  for  either  the  entire  computational 
domain  or  a  coarse  resolution  of  the  entire  domain. 

Similarly  to  the  one  level  preconditioner,  we  define  a  coarse  mesh  restriction  operator  R0  and 

let  B0  =  Rq  (Ro  AR'o  )  Rq.  The  two-level  additive  Schwarz  preconditioner  is 

(2.H)  m  Bo  +  £  n, 

2=1 

and  the  two-level  hybrid  II  preconditioner  [22, 27]  is  given  by 

(2.12)  M  =  Bo  +  (I-BoA)j^Bl. 

2=1 

In  the  work  reported  here  we  found  that  the  alternative  form 

(2.13)  M  =  BQ  +  Y,Bi{I-ABQ) 

2=1 

performed  better. 

To  avoid  generating  and  storing  a  separate  coarse  mesh,  we  define  coarse  mesh  basis  functions 
as  aggregates  of  already  existing  fine  mesh  basis  functions,  an  idea  from  algebraic  multigrid  [33]. 
One-dimensional  examples  of  these  coarse  mesh  basis  functions  are  depicted  in  Figure  2.2,  with 
one  coarse  mesh  basis  function  defined  per  subdomain. 


Fig.  2.2.  Coarse  Mesh  Basis  Function  in  One-Dimension 


The  coarse  mesh  basis  functions  are  formed  by  summing  the  fine  mesh  functions  having  sup¬ 
port  in  a  given  subdomain.  If  {iy}  is  the  nodal  basis  for  Vh  and  {77/}  is  the  set  of  nodes  in 
subdomain  I,  then  the  coarse  mesh  functions  V}  are  formed  by 

v,  =  Y. 

ieDj 
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The  coarse  mesh  inverse  B0  can  be  constructed  from  the  fine  mesh  Jacobian  in  a  direct  way. 

Bo  =  Rq  [RoJ  (Uh,  Rq  j  R0 

where  J  is  the  Jacobian  of  7 Z  and  Rt  is  defined  by  its  action  on  a  vector,  i.e., 

(Rj,u) j  y '  ii 

*££>/ 

2.4.  Parallel  implementation.  We  assign  one  or  more  subdomains  to  each  processor.  Ele¬ 
ments  along  the  interprocessor  boundaries  are  shared  by  those  processors  owning  any  of  its  nodes. 
The  nodal  information  for  the  boundary  elements  must  be  communicated  among  processors  so  that 
each  processor  sees  all  of  the  shared  elements. 

The  linear  solver  without  preconditioning  requires  two  types  of  communication:  (1)  an  update 
of  the  nodal  values  along  processor  edges  for  each  matrix  vector  product  and  (2)  a  global  sum 
for  each  inner  product.  The  additive  Schwarz  /  block  Jacobi  preconditioning  does  not  require 
any  communication.  The  coarse  mesh  preconditioning  also  requires  two  types  of  communication. 
Initially,  each  processor  forms  part  of  the  coarse  matrix,  and  a  global  communication  is  used  to 
assemble  these  parts  on  all  of  the  processors.  The  coarse  mesh  problem  is  then  solved  redundantly 
on  every  processor.  Each  application  of  the  preconditioner  also  requires  the  communication  of  the 
residual  vector.  Each  processor  sums  its  pieces  of  the  residual  vector,  and  a  global  communication 
is  used  to  pass  these  pieces  to  all  of  the  processors.  The  coarse  preconditioner  is  then  applied  to 
the  reduced  residual  vector.  Each  processor  then  expands  the  appropriate  portion  of  the  reduced 
vector  and  updates  its  portion  of  the  full  residual  vector. 

3.  Numerical  Results.  In  this  section  we  report  on  a  simulation  of  a  heterogeneous  column 
experiment.  The  physical  problem,  pictured  in  Figure  3. 1,  is  a  column  filled  with  a  mixture  of  clay, 
silt,  and  sand.  The  column  is  primarily  sand,  with  a  clay  lens  near  the  bottom  of  the  column  and  silt 
lenses  in  several  places  throughout  the  column.  Boundary  conditions  applied  to  the  column  were 
prescribed  total  head  (pressure  head  plus  elevation  from  a  datum)  at  the  bottom  and  no-water- 
flux  boundaries  on  the  sides  and  top.  The  no  flow  condition  imposed  on  the  top  of  the  column 
means  that  there  is  no  flow  from  the  water  phase  through  the  top  of  the  column.  Air  is  assumed 
to  be  present  everywhere  and  will  enter  the  column  onasce  the  water  is  removed  from  the  pore 
spaces.  Initially,  the  column  is  water  saturated.  At  the  start  of  the  simulation,  the  bottom  boundary 
condition  is  gradually  modified  to  atmospheric  pressure  head  and  the  column  was  allowed  to  drain 
by  gravity  for  100  seconds. 
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FlG.  3.1.  3D  Heterogeneous  Column 


Sand 


Silt 


Clay 


We  discretized  the  equation  on  an  unstructured  tetrehedral  mesh.  We  used  the  piecewise  con¬ 
stant  in  time  and  piecewise  linear  in  space  finite  element  discretizations  from  [1],  The  residual 
formulation  of  Richards’  equation  is 


(3.1) 


R  (ip,w)  =  JQnR{ip)whdQ 

Iq,  {s*s (vo  ¥  +  »'*  <‘Q  ~  /< 


Vwh  :  [Kskr  (V)  V  (y  +  z)\  dQ 

fnn  -{ss{(s  (iph)  -  (s  (i>h)  v>*)n]  +  V  [s  (*Ph)+n  -  S  (V)  J }  <m 

fpn  [wh  ■  Kskr  (xph)  V  (i’h  +  z)]  ■  h  dP  +  fQn  wh  ■  W  dQ 


In  (3.1)  U  is  the  column,  T  the  boundary,  Qn  =  Q,  x  (tn, tn+ 1],  and  Pn  =  T  x  (fn,  tn+ 1].  Pn 
denotes  the  part  of  Pn  where  Neumann  conditions  are  applied. 

We  report  on  two  discretizations  of  the  column,  one  roughly  8  times  the  size  of  the  other. 
In  this  way  h,  the  typical  cell  diameter  for  the  mesh,  and  H,  the  typical  subdomain  size,  are 
both  approximately  halved  as  the  mesh  is  refined.  This  allows  us  to  gauge  the  scalability  of  the 
iteration.  The  coarse  mesh  was  generated  automatically  and  refined  by  hand,  (roughly)  halving 
the  mesh  width  in  each  of  the  x,  y,  and  z  directions.  The  small  mesh  has  5881  nodes  and  30720 
elements  and  the  large  mesh  has  43889  nodes  and  245760  elements.  The  meshes  were  generated 
using  the  Groundwater  Modeling  System  (GMS)  [2]  and  subdomains  were  constructed  using  the 
node  ordering  from  GMS. 

We  terminate  the  linear  iteration  using  (2.6),  the  l°°  norm,  and  [rja,rjr\  =  [10_7,0].  We 
terminate  the  nonlinear  iteration  when  (2.3)  holds  with  [ra,rr]  =  [10_5,0].  In  the  experiments 
reported  here  only  one  nonlinear  iteration  was  needed  at  each  time  step. 

Updating  preconditioning  information  infrequently  in  a  temporal  integration  is  common  prac¬ 
tice.  In  codes  like  DASPK,  [4],  for  example,  the  decision  to  update  preconditioning  information  at 
a  given  time  step  can  be  tied  to  the  performance  of  the  linear  or  nonhnear  solver.  A  two-level  pre¬ 
conditioner  has  two  parts,  the  subdomain  solvers  and  the  coarse  mesh  work,  i.  e.  computation  and 
factorization  of  the  coarse  mesh  matrix.  In  this  computation  we  found  that  the  coarse  mesh  work 
scaled  poorly.  The  reasons  for  this  are  that  each  processor  must  compute  a  part  of  the  coarse  mesh 
matrix  and  communicate  it  to  the  others  and  that  a  dense  matrix  LU  factorization  was  used  because 
we  had  no  way,  other  than  brute  force,  to  obtain  sparsity  information  about  the  coarse  mesh  matrix. 
We  reduced  computation  time  by  20%  by  not  performing  the  coarse  mesh  work  at  every  time  step. 
In  the  computations  reported  here,  the  coarse  mesh  work  was  done  every  10  nonlinear  iterations. 
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A  more  sophisticated  strategy  for  updating  both  components  of  the  preconditioner  is  needed  and 
will  be  the  subject  of  future  research. 

In  Table  3.1  we  report  on  the  number  of  linear  iterations  for  the  entire  simulation  (LIC)  for 
four  preconditioners:  point  Jacobi,  one-  and  two-level  additive  Schwarz,  and  the  two-level  hybrid 
II  preconditioner  given  by  (2.13).  We  give  the  time  in  seconds  for  the  entire  simulation  (TS)  and 
the  cumulative  time  for  the  linear  solver  (TL).  The  final  row  in  the  table  is  the  ratio  of  iteration 
counts  for  the  two  problems,  which  we  use  as  a  measure  of  scalability. 

Table  3.1  shows  that  the  one-level  preconditioner  is  worse  than  point  Jacobi  in  terms  of  com¬ 
putational  time  and  that  both  the  two-level  and  the  hybrid  two-level  preconditioner  are  far  better  in 
terms  of  both  number  of  iterations  and  computer  time. 


Table  3.1 

3D  Heterogeneous  Column  Results 


Small:  5  PEs,  8  blocks 

Point  Jacobi  One-Level 

Two-Level 

Hybrid 

LI 

13364 

8292 

2137 

1493 

TS 

84 

133 

74 

71 

TL 

50 

99 

32 

30 

Large:  40  PEs,  8  blocks 

LI 

19394 

16783 

2688 

1848 

TS 

184 

275 

107 

106 

TL 

130 

235 

63 

55 

Ratio 

1.45 

2.02 

1.26 

1.24 

The  computations  reported  here  were  performed  on  an  IBM  SP  with  177  200Mhz  Power3 
processors  on  2-way  SMP  nodes  with  1Gb  of  memory  per  node  mnning  IBM  AIX  Version  4.3, 
IBM  PE  2.4,  and  IBM  C  for  AIX  Version  4.4. 

The  results  of  the  test  problems  show  that  the  incorporation  of  a  coarse  mesh  problem  into 
the  preconditioner  is  necessary  to  obtain  a  significant  reduction  in  the  computational  time.  The 
aggregate  elements  appear  to  have  been  a  sensible  choice  for  the  coarse  mesh  problem  and  were 
easy  to  implement. 
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